Abstract - Many studies of computer-based chromosome analysis have shown
that chromosomes can be classified into 24 subgroups. In addition,
artificial neural networks (ANNs) have been adopted for human chromosome
classification. Selecting optimum features is important for training a
neural network classifier. We selected several features used to identify
chromosomes (relative length, normalized density profile (d.p.), and
centromeric index) and trained a neural network classifier while varying
the number of samples used to obtain the d.p. We found that the
classification error was lowest when this number was equal to or greater
than the length of the No. 1 human chromosome.
This research focused on feature selection to minimize the
classification error.

II. METHODOLOGY

A.  Chromosome  Data 

The proposed methodology was applied to the Edinburgh database. This
chromosome database was provided by Dr. Piper and exported from the
Medical Research Council Human Genetics Unit, Edinburgh, UK.

B.  Chromosome  Feature  Extraction 


I.  INTRODUCTION 

Human chromosome analysis is an essential task in cytogenetics,
especially in prenatal screening, genetic syndrome diagnosis, cancer
pathology research, and environmentally induced mutagen dosimetry [3].
Cells used for chromosome analysis are taken mostly from amniotic fluid
or blood samples. The stage at which chromosomes are most suitable for
analysis is metaphase. One of the aims of chromosome analysis is the
creation of a karyotype, which is a layout of chromosome images arranged
in pairs by decreasing size. The karyotype is obtained through a sequence
of procedures: cell culture, slide preparation, selection of the
best-observable chromosome image, analysis, and classification. However,
even today, chromosome analysis and karyotyping are performed manually in
most cytogenetics laboratories, in a repetitive, time-consuming, and
therefore expensive procedure [2].

Automatic chromosome analysis has therefore attracted much attention, and
efforts to develop automatic chromosome classification techniques have
been made over the last 20 years. An ANN is suitable for automatic
chromosome classification because human chromosome images (see Fig. 1)
have nonlinear properties. The purpose of this study is to find the best
features for neural network classifiers.
Fig. 1 The human chromosome images


We obtained the skeleton of a chromosome by applying the medial axis
transformation (MAT). The MAT is widely used as a convenient
representation of elongated objects, for example in character recognition
or chromosome analysis, where the width of the objects contains little
useful information [5]. In addition, we used a thinning algorithm that
iteratively deletes edge points of a region, subject to the constraints
that deleting these points does not remove end points, does not break
connectedness, and does not cause excessive erosion of the region [4].
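The thinning step described above can be illustrated with a short sketch. The paper does not name the specific thinning algorithm, so the code below uses the well-known Zhang-Suen scheme as a stand-in; the function name `zhang_suen_thin` and the list-of-lists image format (0 = background, 1 = chromosome, with a one-pixel background border) are assumptions made for illustration only.

```python
def zhang_suen_thin(image):
    """Thin a binary image (rows of 0/1) toward a one-pixel-wide skeleton.

    Assumes the outermost rows and columns are background (0).
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    b = sum(n)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the pixel;
                    # exactly one means deletion keeps connectedness.
                    a = sum(1 for k in range(8)
                            if n[k] == 0 and n[(k + 1) % 8] == 1)
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    # b >= 2 protects end points; b <= 6 limits erosion.
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            # Delete all flagged pixels at once, after the full scan.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img


# A 3-pixel-thick horizontal bar with a background border.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
skeleton = zhang_suen_thin(bar)
```

The skeleton produced this way tends to stop short of the object boundary, which is consistent with the paper's later step of extending the skeleton line to the boundary before measuring length.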

1) Relative length: One of the significant morphological features used to
identify a chromosome is its length [1]. After applying the MAT, we
extended the skeleton line to the chromosome boundary and took the
chromosome length to be the number of pixels in the extended skeleton
line. Because length varies with the phase of cell division, it must be
normalized. The relative length of the i-th chromosome ($l_{ri}$) is
obtained by normalizing the medial axis length as follows:

$$ l_{ri} = \frac{l_i}{l_t} \qquad (1) $$

where $l_i$ ($i = 1, 2, \ldots, 24$) is the length of the i-th chromosome
and $l_t$ is the total length of all 46 chromosomes of one cell.
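As a minimal sketch of Eq. (1), the relative length can be computed by dividing each medial axis length by the total over the cell. The function name `relative_lengths` and the example values are hypothetical and are not taken from the Edinburgh data.

```python
def relative_lengths(lengths):
    """Divide each chromosome length by the total length of the cell, Eq. (1)."""
    total = sum(lengths)
    if total <= 0:
        raise ValueError("total chromosome length must be positive")
    return [length / total for length in lengths]


# Hypothetical medial axis lengths (in pixels) for a toy "cell".
example = relative_lengths([2, 2, 4])
```

Because every value is divided by the same total, the relative lengths of one cell sum to 1, which removes the dependence on how condensed the chromosomes are in that cell.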

2)  Centromeric  index:  The  centromeric  index  (C.I.)  is  the 
ratio  of  the  length  of  the  short  arm  to  the  whole  length  of  a 
chromosome.  It  is  another  significant  morphological  feature 
used  to  identify  the  chromosome. 

$$ C_I = \frac{\text{short arm length}}{\text{whole length of medial axis}} \qquad (2) $$

where $C_I$ is the C.I. This index, which indicates the location of the
centromere of the chromosome, is obtained from the shape profile. The
shape profile of a chromosome is obtained by measuring the width along a
transverse line, perpendicular to the tangent of the medial axis and
centered at unit distance along the medial axis [1].
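The sketch below shows one way Eq. (2) could be evaluated from a shape profile. The paper does not state how the centromere is located in the profile; the code assumes it lies at the narrowest interior point, which is a common heuristic and not necessarily the authors' method. The name `centromeric_index` and the `margin` parameter (which excludes the tapered chromosome ends from the search) are hypothetical.

```python
def centromeric_index(width_profile, margin=2):
    """Estimate the C.I. from widths sampled at unit steps along the medial axis.

    Assumption: the centromere is the narrowest point, ignoring `margin`
    samples at each end where the chromosome tapers.
    """
    n = len(width_profile)
    if n <= 2 * margin:
        raise ValueError("profile too short for the chosen margin")
    interior = range(margin, n - margin)
    centromere = min(interior, key=lambda i: width_profile[i])
    # Arm lengths in unit steps on either side of the centromere.
    short_arm = min(centromere, n - 1 - centromere)
    return short_arm / (n - 1)


# Hypothetical width profile with a constriction at index 3.
ci = centromeric_index([3, 5, 6, 2, 6, 7, 7, 6, 5, 3])
```

Since the short arm is by definition the shorter of the two, the value returned never exceeds 0.5.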
3) Normalized density profile: The Giemsa-stained human chromosome has a
sequence of bands perpendicular to its medial axis. The density profile
is a one-dimensional graph of this banding pattern, computed at a
sequence of points along the possibly curved medial axis. It is obtained
from measurements made along transverse lines perpendicular to the
tangent of the medial axis. Each profile value ($I_i$) is the sum of the
values of points spaced at unit distance apart along one transverse line:

$$ I_i = \sum_{j=0}^{m-1} d(i,j), \quad i = 0, 1, \ldots, n-1 \qquad (3) $$

where $m$ is the number of pixels on the line perpendicular to the
tangent of the medial axis and $d(i,j)$ is the pixel value on that line.

To reduce the variation of the density values due to different cell
culturing conditions, the density profile is normalized three times by
three different methods. First, the profile is normalized in the
direction perpendicular to the medial axis to reduce the effect of
variation in chromosome width, giving $d_w(i)$:

$$ d_w(i) = \frac{I_i}{w(i)}, \quad i = 0, 1, \ldots, n-1 \qquad (4) $$

where $w(i)$ is the width of the chromosome at the i-th point.
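A minimal sketch of Eqs. (3) and (4) follows. It assumes the pixel values along each transverse line have already been sampled, and it takes the width w(i) to be the number of sampled pixels on line i; both the input format and the name `width_normalized_profile` are assumptions for illustration.

```python
def width_normalized_profile(transverse_lines):
    """Sum the pixel values on each transverse line, then divide by its width.

    transverse_lines[i] holds the pixel values d(i, j) sampled at unit
    spacing on the i-th line perpendicular to the medial axis.
    """
    profile = []
    for line in transverse_lines:
        if not line:
            raise ValueError("each transverse line needs at least one pixel")
        total = sum(line)          # Eq. (3): sum over the transverse line
        width = len(line)          # assumed: w(i) = number of sampled pixels
        profile.append(total / width)  # Eq. (4): width normalization
    return profile


# Two hypothetical transverse lines of different widths.
d_w = width_normalized_profile([[10, 20, 30], [5, 5]])
```

Dividing by the width turns each profile value into a mean density, so a wide and a narrow section of the same band yield comparable values.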

Next, histogram equalization is applied to the profile to reduce the
effect of nonhomogeneous illumination conditions of the microscope.
Finally, the profile is normalized along the medial axis to obtain the
same number of normalized density values ($d_N(i)$) regardless of
chromosome length [1]:

$$ d_N(i) = \frac{d_w(i) - d_{w\mathrm{MIN}}(i)}{d_{w\mathrm{MAX}}(i)}, \quad i = 0, 1, \ldots, n-1 \qquad (5) $$

where $d_{w\mathrm{MIN}}(i)$ is the minimum density value and
$d_{w\mathrm{MAX}}(i)$ is the maximum.
In this study, five normalized density profiles with different numbers
of density values were obtained, and the classification error was
compared across the five cases.

C.  Chromosome  Data  Sets 

Five data sets were prepared to select the best number of density values
for one input pattern. The sets used the same features (relative length
+ C.I. + d.p.) and differed only in the number of density values. Each
training set consisted of 460 input patterns, extracted from 460
chromosomes.

In addition to the training sets, test sets were prepared to test the
trained neural network classifier. Each test set had the same number of
input features as the corresponding training set, but the features were
extracted from chromosomes that were not used to prepare the training
sets.
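To make the structure of an input pattern concrete, the sketch below assembles one feature vector from the three features. The ordering of the features within the vector is an assumption, as is the name `build_input_pattern`; the paper only states which features are included.

```python
def build_input_pattern(relative_length, centromeric_index, density_profile,
                        n_density):
    """Concatenate relative length, C.I., and n_density density values."""
    if len(density_profile) != n_density:
        raise ValueError("density profile has the wrong number of values")
    return [relative_length, centromeric_index] + list(density_profile)


# The five profile lengths compared in the paper.
sizes = {n: len(build_input_pattern(0.05, 0.4, [0.0] * n, n))
         for n in (25, 50, 70, 80, 100)}
```

Each pattern therefore has two more entries than its number of density values, so a profile of 100 values gives 102 inputs.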

D.  Neural  Network  Classifier 

In this study, an ANN was examined as the classifier. A two-layer neural
network trained with the error backpropagation algorithm was adopted.
The neural network classifier was implemented in software, in which the
number of output nodes was fixed at 24, but the number of input nodes,
the number of processing elements in the hidden layer, the number of
training iterations, the learning constant, the momentum term, and the
upper bound of the error value were programmable.
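The following is a minimal sketch of such a classifier, not the authors' software. It assumes that "two-layer" means one hidden layer plus one output layer, sigmoid units, a squared-error criterion, and per-pattern weight updates; none of these details are given in the paper. The class name `TwoLayerNet` and all parameter names and default values are hypothetical.

```python
import math
import random


def _sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


class TwoLayerNet:
    """One hidden layer plus an output layer, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out=24, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one extra entry for the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]
        self.v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]  # momentum
        self.v2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]

    @staticmethod
    def _layer(weights, inputs):
        ext = inputs + [1.0]
        return [_sigmoid(sum(w * x for w, x in zip(row, ext)))
                for row in weights]

    def forward(self, x):
        hidden = self._layer(self.w1, list(x))
        return hidden, self._layer(self.w2, hidden)

    @staticmethod
    def _update(weights, velocity, deltas, inputs, lr, momentum):
        for k, delta in enumerate(deltas):
            for j, x in enumerate(inputs):
                velocity[k][j] = momentum * velocity[k][j] + lr * delta * x
                weights[k][j] += velocity[k][j]

    def train(self, patterns, targets, epochs=1000, lr=0.1, momentum=0.9,
              max_error=0.01):
        """Return the summed squared error of the last epoch run."""
        error = float("inf")
        for _ in range(epochs):
            error = 0.0
            for x, t in zip(patterns, targets):
                h, y = self.forward(x)
                error += 0.5 * sum((ti - yi) ** 2 for ti, yi in zip(t, y))
                d_out = [(ti - yi) * yi * (1 - yi) for ti, yi in zip(t, y)]
                # Hidden deltas use the output weights before they change.
                d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j]
                                             for k in range(len(d_out)))
                         for j, hj in enumerate(h)]
                self._update(self.w2, self.v2, d_out, h + [1.0], lr, momentum)
                self._update(self.w1, self.v1, d_hid, list(x) + [1.0],
                             lr, momentum)
            if error <= max_error:  # stop once the error bound is met
                break
        return error

    def classify(self, x):
        _, y = self.forward(x)
        return max(range(len(y)), key=lambda k: y[k])


# Toy three-class problem (not chromosome data) to exercise the code.
toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
net = TwoLayerNet(n_in=3, n_hidden=4, n_out=3, seed=1)
first_error = net.train(toy, toy, epochs=1)
later_error = net.train(toy, toy, epochs=500)
```

For the configuration reported in the paper, the corresponding call would be `TwoLayerNet(n_in=102, n_hidden=..., n_out=24)`, with the hidden layer size left open because the paper does not report it.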

III. RESULTS

We trained one neural network classifier with each feature set (25, 50,
70, 80, or 100 density values, plus C.I. and relative length). We then
applied the test sets to the trained networks and obtained the
classification errors shown in Table I.

The error was lowest when the number of input nodes was 102, comprising
relative length, C.I., and 100 density values.

IV.  DISCUSSION 

We selected three features (relative length, C.I., and d.p.) to identify
a chromosome and compared training sets that differed only in the number
of density values, in order to find the best number of density values per
chromosome. Other features were not used in this study. Further work is
needed to identify additional chromosome features that reduce the
classification error.

V.  Conclusion 

We found that the classification error was lowest when the number of
density values was equal to or greater than the length of the No. 1
human chromosome. The two-layer neural network trained with the error
backpropagation algorithm showed good potential for the classification
of Giemsa-stained chromosomes.

TABLE I
CLASSIFICATION ERROR

Number of density values    25     50     70     80     100
Error (%)                   10.2   7.2    5.8    5.1    3.6